Rule-Based Sentence Detection Method (RBSDM) for Turkish
Abstract
The first step in building a corpus that represents a language is determining its sentences. This step is complicated and hard to solve, but it is an essential part of corpus generation. Different approaches have been tried for detecting sentence boundaries in various languages; for Turkish, the best-known approaches rely on statistics and machine learning. In this study, a rule-based method called “Rule-Based Sentence Detection Method for Turkish (RBSDM)” was developed to determine sentence boundaries in contemporary Turkish, taking into account the agglutinative, rule-based structure of the language. The method was tested on two test sets built from randomly selected columns from two Turkish newspapers. RBSDM identifies sentence ends correctly and efficiently in terms of time and other costs, achieving success rates between 99.60% and 99.80%.
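The abstract does not list RBSDM's actual rules, so the following is only a minimal sketch of what a rule-based sentence splitter for Turkish can look like, not the authors' implementation. The rules and the abbreviation list `ABBREVIATIONS` are illustrative assumptions. A terminator must be followed by whitespace, which rules out decimals such as "3.5". A period after a known abbreviation is not a boundary. A period is a boundary only when the next token starts with an uppercase letter or digit, which keeps Turkish ordinals such as "15. yüzyıl" intact.

```python
from typing import List

# Hypothetical abbreviation list for illustration only; RBSDM's lexicon is not given.
ABBREVIATIONS = {"dr", "prof", "doç", "av", "vb", "vs", "bkz", "sn", "yrd"}
TERMINATORS = ".!?"
CLOSERS = "\"'”’)"   # may directly follow a terminator ("Geldim.")
OPENERS = "\"'“‘("   # may precede the first letter of the next sentence


def _word_before(text: str, i: int) -> str:
    """Return the run of letters/digits immediately before position i."""
    start = i
    while start > 0 and text[start - 1].isalnum():
        start -= 1
    return text[start:i]


def is_boundary(text: str, i: int) -> bool:
    """Apply the illustrative rules to the terminator at text[i]."""
    j = i + 1
    while j < len(text) and text[j] in CLOSERS:
        j += 1
    if j == len(text):                 # end of text always closes a sentence
        return True
    if not text[j].isspace():          # "3.5", "site.com": no boundary
        return False
    k = j
    while k < len(text) and (text[k].isspace() or text[k] in OPENERS):
        k += 1
    if k == len(text):
        return True
    nxt = text[k]
    if text[i] in "!?":                # '!' and '?' end a sentence unless lowercase follows
        return not nxt.islower()
    if _word_before(text, i).lower() in ABBREVIATIONS:
        return False                   # "Dr. Ayşe": abbreviation, not a boundary
    # Ordinals ("15. yüzyıl") are followed by a lowercase word, so they fail this test.
    return nxt.isupper() or nxt.isdigit()


def split_sentences(text: str) -> List[str]:
    """Split text into sentences using the rules in is_boundary."""
    sentences: List[str] = []
    start = i = 0
    while i < len(text):
        if text[i] not in TERMINATORS:
            i += 1
            continue
        end = i                        # treat runs like "..." or "?!" as one terminator
        while end + 1 < len(text) and text[end + 1] in TERMINATORS:
            end += 1
        if is_boundary(text, end):
            stop = end + 1
            while stop < len(text) and text[stop] in CLOSERS:
                stop += 1
            if text[start:stop].strip():
                sentences.append(text[start:stop].strip())
            start = stop
        i = end + 1
    if text[start:].strip():
        sentences.append(text[start:].strip())
    return sentences
```

For example, `split_sentences("Dr. Ayşe Yılmaz 15. yüzyıl şiirini inceledi. Fiyat 3.5 liraydı! Neden? Çünkü öyle.")` yields four sentences, with "Dr.", "15." and "3.5" left unsplit. Python's `str.isupper()` and `str.islower()` are Unicode-aware, so Turkish capitals such as "Ç", "Ş" and "İ" are handled without extra tables. A real system would need a much larger abbreviation lexicon and further rules for cases this sketch gets wrong.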